
    A Method for Mapping XML DTD to Relational Schemas in the Presence of Functional Dependencies

    The eXtensible Markup Language (XML) has recently emerged as a standard for data representation and interchange on the web. With so much XML data on the web, there is now pressure to manage that data efficiently. Given that relational databases are the most widely used technology for managing and storing data, XML frequently needs to be mapped to relations. There are many ways to perform this mapping, and many approaches exist in the literature, especially considering the flexible nesting structures that XML allows. This gives rise to an important problem: are some mappings ‘better’ than others? To approach this problem, we refer to classical relational database design through normalization, which is based on the well-known concept of functional dependency. This concept is used to specify the constraints that may exist in relations and to guide the design while removing semantic data redundancies, leading to a well-normalized relational schema without data redundancy. To achieve such a schema for XML, the concept of functional dependency in relations needs to be extended to XML and used as guidance for the design. Although functional dependency definitions for XML exist, they are not yet standard and still have several limitations: under the existing definitions, constraints over the shared and local elements that occur in an XML document cannot be specified. This study proposes a new definition of functional dependency constraints for XML that is general enough to specify constraints and to discover semantic redundancies in XML documents. The focus of this study is on producing an optimal mapping approach in the presence of XML functional dependencies (XFDs), keys and Document Type Definition (DTD) constraints, as guidance for generating a good relational schema. To approach the mapping problem, three components are explored: the mapping algorithm, functional dependency for XML, and the implication process. The study of XML implication is important for determining which other dependencies are guaranteed to hold in a relational representation of XML, given that a set of functional dependencies holds in the XML document. This leads to the need to derive a set of inference rules for the implication process: in the presence of a DTD and user-defined XFDs, further XFDs that are guaranteed to hold in the XML can be generated using these rules. The mapping algorithm has been implemented in a tool called XtoR. The quality of the mapping approach has been analyzed, and the results show that XtoR significantly improves the generation of a good relational schema for XML with respect to reducing data and relation redundancy, removing dangling relations, and removing association problems. The findings suggest that if one wants to use an RDBMS to manage XML data, the mapping from XML documents to relations must be based on functional dependency constraints.
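    To make the flavor of FD-guided mapping concrete, the hypothetical Python sketch below groups XML paths into relations so that each XFD determinant becomes the key of its own relation and the attributes it determines are stored only once. It is not the XtoR algorithm; the path names and the grouping rule are invented for illustration.

```python
# Hypothetical sketch: group XML leaf paths into relations using XFDs.
# Not the XtoR algorithm; it illustrates only the idea that each
# determinant gets its own relation, so determined attributes are
# stored exactly once.

def group_into_relations(paths, xfds):
    """paths: list of XML leaf paths; xfds: list of (lhs, rhs) sets of paths."""
    relations, placed = [], set()
    for lhs, rhs in xfds:
        # One relation per determinant: its key plus what it determines.
        relations.append({"key": sorted(lhs), "attrs": sorted(lhs | rhs)})
        placed |= lhs | rhs
    leftover = [p for p in paths if p not in placed]
    if leftover:  # paths not covered by any XFD go into a catch-all relation
        relations.append({"key": sorted(leftover), "attrs": sorted(leftover)})
    return relations

paths = ["book/title", "book/author", "book/author/affiliation"]
xfds = [({"book/author"}, {"book/author/affiliation"})]
for rel in group_into_relations(paths, xfds):
    print(rel["key"], "->", rel["attrs"])
# ['book/author'] -> ['book/author', 'book/author/affiliation']
# ['book/title'] -> ['book/title']
```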

    Biomedical Named Entity Recognition: A Review

    Biomedical Named Entity Recognition (BNER) is the task of identifying biomedical instances such as chemical compounds, genes, proteins, viruses, disorders, DNAs and RNAs. The key challenge behind BNER lies in the methods used for extracting such entities. Most BNER methods rely on Supervised Machine Learning (SML) techniques, in which features play an essential role in improving the effectiveness of the recognition process. Features can be defined as a set of discriminating and distinguishing characteristics that indicate the occurrence of an entity. As such, features should generalize, that is, discriminate entities correctly even on new and unseen samples. Several studies have tackled the role of features in identifying named entities. However, with the surge of biomedical research, there is a vital demand to explore biomedical features. This paper provides a review of the features that can be used for BNER, examining various types of features including morphological features, dictionary-based features, lexical features and distance-based features.
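    As a concrete illustration of the feature types this review examines, the following minimal sketch computes a few morphological, dictionary-based and lexical features for a single token. The lexicon here is a toy stand-in, not a real biomedical dictionary, and the feature names are invented for illustration.

```python
# Toy token-level feature extractor for BNER-style classification.
GENE_LEXICON = {"brca1", "tp53", "egfr"}  # hypothetical dictionary

def token_features(token):
    return {
        # morphological features: shape and affix cues
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
        "suffix3": token[-3:].lower(),
        # dictionary-based feature: lexicon lookup
        "in_gene_lexicon": token.lower() in GENE_LEXICON,
        # lexical feature: the token itself
        "lower": token.lower(),
    }

print(token_features("BRCA1"))
# {'is_capitalized': True, 'has_digit': True, 'has_hyphen': False,
#  'suffix3': 'ca1', 'in_gene_lexicon': True, 'lower': 'brca1'}
```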

    A systematic strategy for harnessing financial information systems in fighting corruption electronically

    Knowledge Management (KM) is one of the hottest topics in both the industrial world and the information research world. Among the systems we deal with in daily life, Financial Information Systems (FIS) are the most relevant; these systems are closely linked with all aspects of administrative organizations. As a result of this link, KM has contributed to developing FIS in attempts to harness them as part of anti-corruption strategies. This research aims to propose a systematic strategy for harnessing FIS in fighting corruption electronically. The proposed strategy provides a general perspective on the design and implementation phases. The research employed an iterative design methodology comprising an extensive literature review, content analysis and website analysis. The approach first explores the key concepts of FIS, corruption strategies, and the popular approaches employed to minimize corruption; cumulatively, a systematic strategy is proposed for enabling FIS to minimize corruption. This research suggests that FIS should be employed heavily in the process of minimizing corruption.

    Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction

    Chemical compound extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen and others. The majority of studies addressing chemical compound extraction have used machine-learning techniques, where the key challenge lies in employing a robust set of features. The literature shows that numerous types of features have been used for this task, and the dimensionality of those features is determined by the data representation. Some researchers have used an N-gram representation for biomedical named entity recognition, where the most significant terms are represented as features; others have used a detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields highly accurate classification becomes challenging. This paper applies the Wrapper Subset Selection approach to two data representations: N-grams and detailed attributes. Since each data representation suits a specific classification algorithm, two classifiers were utilized: Naïve Bayes for detailed attributes and a Support Vector Machine for N-grams. The results show that feature selection using detailed attributes outperformed the N-gram representation, achieving an F-measure of 0.722. Beyond the higher classification accuracy, the features selected under the detailed-attribute representation are more meaningful and can be applied to further datasets.
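    The following sketch mirrors the two set-ups on toy data, assuming scikit-learn is available: a wrapper-style sequential feature selector around Naïve Bayes for detailed attributes, and a linear SVM over character N-gram counts. The data, labels and parameter choices are invented for illustration and are not the paper's experimental setup.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Detailed-attribute representation: rows = tokens, columns = crafted
# features (e.g. has_digit, in_lexicon, token_length); values are invented.
X_attr = np.array([[1, 0, 6], [0, 0, 5], [1, 1, 8], [0, 0, 5]])
y = np.array([1, 0, 1, 0])  # 1 = chemical compound, 0 = other

# Wrapper-style selection: score feature subsets with the classifier itself.
selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=2, cv=2)
selector.fit(X_attr, y)
print("selected attribute columns:", selector.get_support())

# N-gram representation: character n-grams drawn from the token strings.
tokens = ["oxygen", "table", "nitrogen", "chair"]
X_ngram = CountVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(tokens)
svm = LinearSVC().fit(X_ngram, y)
print("SVM training accuracy:", svm.score(X_ngram, y))
```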

    Inferring functional dependencies for XML storage

    XML's hierarchical structure, in which elements may be nested and repeated, allows data redundancy: the same information can appear in more than one place, and indeed the same elements can appear in different sub-trees. This capability makes XML easier to understand and parse, and recovering the information requires fewer joins. This is in contrast to relational data, for which normalization theory was developed to eliminate data redundancy. It is therefore important to detect redundancy in XML data before mapping can be done. In this paper, we use functional dependencies to detect data redundancies in XML documents. Based on inferring further functional dependencies from the given ones, we propose an algorithm for mapping XML DTDs to relational schemas. The result is a “good relational schema” in terms of reducing data redundancy and preserving the semantic constraints.
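    The inference step can be illustrated with the classic attribute-closure computation from relational theory, which this XML setting generalizes. The sketch below is a minimal Python version over plain attribute names; the paper's inference operates on XML paths under a DTD.

```python
# Minimal attribute-closure sketch (Armstrong-style inference) over plain
# attribute names; toy example only.

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, where each FD is a (lhs, rhs) pair of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already implied, its right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Given title -> author and author -> affiliation, transitivity yields
# that title also determines affiliation.
fds = [({"title"}, {"author"}), ({"author"}, {"affiliation"})]
print(closure({"title"}, fds))  # {'title', 'author', 'affiliation'} (order may vary)
```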

    Record Duplication Detection in Database: A Review

    The recognition of similar entities in databases has gained substantial attention in many application areas. Although several techniques have been proposed to recognize and locate duplicate database records, few studies rate the effectiveness of the diverse techniques used for duplicate record detection. Calculating the time complexity of the proposed methods reveals their relative performance, and shows that their efficiency improves when blocking and windowing are applied. Some domain-specific methods train systems to optimize results and improve efficiency and scalability, but they are prone to errors. Most existing methods either fail to discuss scalability or do not consider it thoroughly. Sorting and searching form an essential part of duplication detection, but they are time-consuming. This paper therefore proposes eliminating the sorting process by using a tree structure to improve record duplication detection, which has the added benefit of reducing the time required and offers a probable increase in scalability. For database systems, scalability is an essential feature of any proposed solution because data sizes are huge. Improving the efficiency of identifying duplicate records in databases is an essential step for data cleaning and data integration methods. This paper shows that currently proposed methods fall short of providing solutions that are scalable and highly accurate while reducing processing time when detecting duplicate records in a database. Solving this problem will improve the quality of the data used in decision-making processes.
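    One way to realize the proposed idea, sketched below under invented assumptions, is to insert each record's normalized key into a character trie: records that end on the same trie node are candidate duplicates, found in a single pass with no global sort. This is an illustration of the approach, not the authors' implementation, and the normalization rule is hypothetical.

```python
# Trie-based duplicate detection sketch: no sorting, one pass over records.

def normalize(record):
    # Hypothetical normalization: lowercase, strip spaces, concatenate fields.
    return "".join(field.lower().replace(" ", "") for field in record)

def find_duplicates(records):
    root, duplicates = {}, []
    for rec in records:
        node = root
        for ch in normalize(rec):  # walk/extend the trie along the key
            node = node.setdefault(ch, {})
        bucket = node.setdefault("$", [])  # sentinel collects records ending here
        if bucket:  # a record with the same normalized key was seen before
            duplicates.append((bucket[0], rec))
        bucket.append(rec)
    return duplicates

records = [("John Smith", "NY"), ("john smith", "ny"), ("Jane Doe", "LA")]
print(find_duplicates(records))
# [(('John Smith', 'NY'), ('john smith', 'ny'))]
```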

    Diagnosing the Issues and Challenges in Data Integration Implementation in Public Sector

    Reliable data is like oxygen to application systems: it keeps them breathing and producing meaningful information for day-to-day operations and decision-making in an organization. The increasing demand for reliable data in day-to-day operations poses a major challenge for data integration implementation in many domains, including the public sector. Through a successful data integration implementation, trustworthy, duplicate-free data can be provided to stakeholders. The public sector, a domain that relies on creating value through services to stakeholders, is in critical need of reliable data to serve reliable information to those stakeholders. However, little research has been done on diagnosing the issues and challenges of data integration implementation in the public sector. Identifying these issues and challenges is crucial so that the best recommendations can be made to ensure the feasibility of data integration in the public sector. This research therefore explored the issues and challenges of data integration implementation in the public sector. Data were collected using qualitative methods, through content analysis and expert interviews. Four main issues and challenges were identified, namely: (1) lack of management and organizational support; (2) policy, standards and politics; (3) human resource incapability; and (4) lack of governance. The findings of this research illuminate the issues and challenges of data integration implementation in the public sector and offer the opportunity for future work to propose solutions to overcome them.

    Factors influencing interdepartmental information sharing practice in electronic government agencies

    Electronic information sharing is a key to effective government. This study investigates the factors influencing interdepartmental information sharing (IS) practice in electronic government (EG) agencies. Based on previous studies and observation, the issues in electronic government and information sharing are highlighted and the influencing factors identified. Three domains of factors are considered in this study: individual, organizational and technological. This paper proposes a conceptual framework of interdepartmental information sharing for electronic government agencies in Malaysia.